AITopics | hinge loss

We study linear regression and classification in a setting where the learning algorithm is allowed to access only a limited number of attributes per example, known as the limited attribute observation model. In this well-studied model, we provide the first lower bounds giving a limit on the precision attainable by any algorithm for several variants of regression, notably linear regression with the absolute loss and the squared loss, as well as for classification with the hinge loss. We complement these lower bounds with a general purpose algorithm that gives an upper bound on the achievable precision limit in the setting of learning with missing data.

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country: Europe (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.88)

Add feedback

a10946e1f46e1ffc0daf37cb2abfdcad-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-29-2026, 05:39:10 GMT

algorithm, artificial intelligence, machine learning, (19 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

Preliminaries

Neural Information Processing SystemsApr-25-2026, 02:42:42 GMT

Given a T-periodic function f(x), its Fourier series representation is a decomposition into a sum of Fourier basis functions.

artificial intelligence, coefficient, sketch, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.68)

Add feedback

11afefdd848d1bc9ac9f1604d9f45817-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 15:50:45 GMT

artificial intelligence, robot, seditor, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (1.00)

Add feedback

Supplemental Materials: AConsolidated Cross-Validation Algorithm for Support Vector Machines via Data Reduction ATechnical Proofs

Neural Information Processing SystemsApr-24-2026, 08:13:40 GMT

C.2 Consolidated CV with random features Alternatively, one can use random features (Rahimi and Recht, 2007) to approximate the kernel matrix. Suppose that we consider shift-invariant kernels that satisfy K(x,y) = K(x y). In this work we use the radial kernel K(x,y) = exp( σ x y 22). The kernel can be approximated by K(x,y) φ(x),φ(y), where an explicit randomized feature mapping φ: IRp IRm is obtained by sampling from a distribution defined by the inverse Fourier transformation.

artificial intelligence, inequality, machine learning, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Add feedback

The Rules-and-Facts Model for Simultaneous Generalization and Memorization in Neural Networks

Farné, Gabriele, Boncoraglio, Fabrizio, Zdeborová, Lenka

arXiv.org Machine LearningMar-27-2026

A key capability of modern neural networks is their capacity to simultaneously learn underlying rules and memorize specific facts or exceptions. Yet, theoretical understanding of this dual capability remains limited. We introduce the Rules-and-Facts (RAF) model, a minimal solvable setting that enables precise characterization of this phenomenon by bridging two classical lines of work in the statistical physics of learning: the teacher-student framework for generalization and Gardner-style capacity analysis for memorization. In the RAF model, a fraction $1 - \varepsilon$ of training labels is generated by a structured teacher rule, while a fraction $\varepsilon$ consists of unstructured facts with random labels. We characterize when the learner can simultaneously recover the underlying rule - allowing generalization to new data - and memorize the unstructured examples. Our results quantify how overparameterization enables the simultaneous realization of these two objectives: sufficient excess capacity supports memorization, while regularization and the choice of kernel or nonlinearity control the allocation of capacity between rule learning and memorization. The RAF model provides a theoretical foundation for understanding how modern neural networks can infer structure while storing rare or non-compressible information.

artificial intelligence, generalization error, machine learning, (18 more...)

arXiv.org Machine Learning

2603.25579

Country:

North America (0.14)
Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (0.33)

Industry:

Health & Medicine (0.67)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Add feedback

Benign overfitting in leaky ReLU networks with moderate input dimension

Neural Information Processing SystemsMar-19-2026, 21:08:49 GMT

The problem of benign overfitting asks whether it is possible for a model to perfectly fit noisy training data and still generalize well. We study benign overfitting in two-layer leaky ReLU networks trained with the hinge loss on a binary classification task. We consider input data which can be decomposed into the sum of a common signal and a random noise component, which lie on subspaces orthogonal to one another. We characterize conditions on the signal to noise ratio (SNR) of the model parameters giving rise to benign versus non-benign, or harmful, overfitting: in particular, if the SNR is high then benign overfitting occurs, conversely if the SNR is low then harmful overfitting occurs. We attribute both benign and non-benign overfitting to an approximate margin maximization property and show that leaky ReLU networks trained on hinge loss with gradient descent (GD) satisfy this property. In contrast to prior work we do not require the training data to be nearly orthogonal. Notably, for input dimension $d$ and training sample size $n$, while results in prior work require $d = \Omega(n^2 \log n)$, here we require only $d = \Omega(n)$.

artificial intelligence, machine learning, proceedings, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback